SpamBayes: Effective open-source, Bayesian based, email classification system

Authors

  • Tony A. Meyer
  • Brendon Whateley

Abstract

This paper introduces the SpamBayes classification engine and outlines the most important features and techniques that contribute to its success. It explains the importance of using the indeterminate ‘unsure’ classification produced by the chi-squared combining technique. It outlines a Robinson/Woodhead/Peters technique of ‘tiling’ unigrams and bigrams, which produces better results than relying solely on either unigrams or bigrams, or than other methods of using both. It discusses methods of training the classifier and evaluates the success of different methods. The paper focuses on highlighting techniques that might aid other classification systems rather than on demonstrating the effectiveness of the SpamBayes classification engine.
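The chi-squared combining step and its ‘unsure’ band can be sketched as follows. This is a minimal sketch, assuming per-token spam probabilities have already been computed (the step that derives them from training counts is omitted). It applies Fisher's method to the spam side and the ham side separately, as chi-squared combining is commonly described, and averages the two. The function names (`chi2q`, `chi2_combine`, `classify`) and the cutoffs `HAM_CUTOFF = 0.20` and `SPAM_CUTOFF = 0.90` are hypothetical illustration values, not taken from this paper.

```python
import math
from typing import Sequence, Tuple

# Hypothetical decision thresholds, chosen only for illustration.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def chi2q(x2: float, v: int) -> float:
    """Return P(X >= x2) for X ~ chi-squared with an even number v of degrees of freedom."""
    if v % 2:
        raise ValueError("this closed form requires even degrees of freedom")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Sum of the first v/2 Poisson(m) probabilities: e^-m * m^i / i!
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs: Sequence[float]) -> float:
    """Combine per-token spam probabilities (each strictly in (0, 1)) into a score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # Fisher's method: -2 * sum(ln q) is chi-squared with 2n dof if the q are uniform.
    # s approaches 1 when many tokens look spammy (1 - p is small).
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # h approaches 1 when many tokens look hammy (p is small).
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


def classify(probs: Sequence[float]) -> Tuple[str, float]:
    """Map the combined score to 'ham', 'spam', or the indeterminate 'unsure'."""
    score = chi2_combine(probs)
    if score < HAM_CUTOFF:
        return "ham", score
    if score > SPAM_CUTOFF:
        return "spam", score
    return "unsure", score


if __name__ == "__main__":
    print(classify([0.99] * 10))                # strong, consistent spam evidence
    print(classify([0.01] * 10))                # strong, consistent ham evidence
    print(classify([0.99] * 10 + [0.01] * 10))  # strong evidence in both directions
```

The third call is the case the ‘unsure’ classification exists for: strong spam clues and strong ham clues push both `s` and `h` towards 1, so the score lands near 0.5 and the message is reported as ‘unsure’ instead of being forced into either class.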

Similar resources

Log File Filtering with Off-the-shelf Naïve Bayesian Content Filters

As computer systems become more complex, the state of their inner workings becomes more and more important to the system administrators working to keep them running. Log files provide much needed visibility into these systems, whether they are hardware, operating systems or applications. Unfortunately, systems can easily create overwhelming amounts of data for administrators to comb through. Thi...

A Study of Supervised Spam Detection applied to Eight Months of Personal E-Mail

In the last year or two, unwelcome email has grown to the extent that it is inconvenient, annoying and wasteful of computer resources. More significantly, its volume threatens to overwhelm our ability to recognize welcome messages, and hence to destroy our trust in email as a reliable communication medium. An automatic spam filter can mitigate these problems, provided that it acts in a reliable...

Machine Learning in the Presence of an Adversary: Attacking and Defending the SpamBayes Spam Filter

2 Misleading Learners: Co-opting Your Spam Filter

Using statistical machine learning for making security decisions introduces new vulnerabilities in large scale systems. We show how an adversary can exploit statistical machine learning, as used in the SpamBayes spam filter, to render it useless—even if the adversary’s access is limited to only 1% of the spam training messages. We demonstrate three new attacks that successfully make the filter ...

Exploiting Machine Learning to Subvert Your Spam Filter

Using statistical machine learning for making security decisions introduces new vulnerabilities in large scale systems. This paper shows how an adversary can exploit statistical machine learning, as used in the SpamBayes spam filter, to render it useless—even if the adversary’s access is limited to only 1% of the training messages. We further demonstrate a new class of focused attacks that succ...

Publication date: 2004